Git is a version control system that makes collaboration and rollbacks simpler. It’s like Google Docs’ Version History but better suited to coding environments, especially when teammates need to work on separate parts of a project. Typically, each project lives in its own repository (often shortened to repo), which is basically the folder that holds all your source code, supporting documents, and the history of your work. The repo is central to working with Git. On your local computer, you’ll have a local repo to store your work and history. By connecting it to GitHub or a similar hosting service, you can store a copy in the cloud and collaborate with a team. There, you can see what changes others have made and merge them into one version.
We will be dealing with just the basics of Git and GitHub, but you can still do a lot with these two tools. We will also be using the Git tools that come with RStudio to make life easier. Everything shown here can also be done from the command line if you are already familiar with it and prefer to work that way.
As mentioned above, your project sits in a repository, or repo. This repo keeps track of your version history and branches. We don’t have to worry about branches in this class since we’re dealing with small and simple projects. Each “version” of your project is set by a commit. A commit is a snapshot of the edited files within the repo. All the commits chain together to create the version history for your project.
Locally, each person has a copy of commits on their computer. On GitHub, there is a remote copy of the commit history. To first get access to the repo, you clone the remote repo to make a local copy. When you make changes to your project, you will add the changed files then commit them to your local repo chain (sometimes these two steps are combined into one). After this, you will push the commit to the remote repo so everyone can have access to your work. For others to get your changes, they will pull from the remote repo. If there ever is conflicting work on the local and remote copies, Git will attempt to merge the changes together to form a new complete commit; sometimes you may have to manually assist the merge.
With these six operations, you can use Git effectively.
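For readers who prefer the command line, the cycle described above can be sketched roughly as follows. This is only a sketch under some assumptions: a bare repo on disk (`remote.git`) stands in for GitHub so that it runs without a network connection or an account, and the folder and file names (`seed`, `mine`, `teammate`, `notes.txt`) are made up for illustration. With a real project, you would clone the HTTPS URL of your GitHub repo instead.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for GitHub (an assumption of this sketch): a bare repo with one starting commit
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# demo project" > seed/README.md
git -C seed add README.md
git -C seed -c user.name="GitHub" -c user.email="github@example.com" commit -q -m "Initial commit"
git clone -q --bare seed remote.git

# clone: you and a teammate each make a local copy of the remote repo
git clone -q remote.git mine
git clone -q remote.git teammate
git -C mine config user.name "John Doe"
git -C mine config user.email "johndoe@example.com"

# add + commit: stage the changed file, then snapshot it in the local history
echo "first analysis notes" > mine/notes.txt
git -C mine add notes.txt
git -C mine commit -q -m "Add analysis notes"

# push: send the new commit to the remote repo
git -C mine push -q origin main

# pull: the teammate fetches the commit and brings their copy up to date
git -C teammate pull -q origin main
latest="$(git -C teammate log -1 --format=%s)"
echo "teammate now has: $latest"
```

The pull here is a simple fast-forward because only one person changed anything. The sixth operation, merge, only comes into play when both copies have new commits.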
Before we can start utilizing Git, we need to download the tools, create accounts, and deal with settings.
In RStudio, open the Tools drop down menu and select Global Options, then open the Git/SVN tab. You can ignore SVN and RSA/SSH, as those aren’t needed for this class. Your settings should look something like this:
RStudio’s Git Settings
Now that we have Git installed, we need to set some identifying information so others can know who wrote your commits. The email should be the same as your GitHub account email. The name can be your first name, full name, username, or any other identifier. I suggest you use your name for this class.
To set these settings, we need to open the terminal. There are a couple of ways to do this:
If on Mac: Search for and open the ‘Terminal’ app
Terminal on Mac
If on Windows: Search for and open the ‘Command Prompt’ app
Terminal on Windows
Within RStudio, either OS: Click the ‘Terminal’ tab near the bottom, next to the ‘Console’ tab
Terminal within RStudio
Once in the terminal, type in these two commands, putting your name and email, and press enter after each:
git config --global user.name "John Doe"
git config --global user.email "johndoe@example.com"
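If you want to double check that the settings were saved, you can read them back with `--get`. The sketch below points `HOME` at a throwaway folder first, purely so that running it as-is does not overwrite your real settings; that redirect is an addition of this example, and you would leave those two lines out to inspect your actual configuration.

```shell
set -e
# Throwaway HOME so this sketch leaves your real settings alone (an assumption of
# the example); skip these two lines to check your actual configuration.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "John Doe"
git config --global user.email "johndoe@example.com"

# Read the two values back
name="$(git config --global --get user.name)"
email="$(git config --global --get user.email)"
echo "Commits will be recorded as: $name <$email>"
```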
In the past, all you needed was your username and account password to work with GitHub. However, that was a single point of failure and a security risk. Now, you need to create a special “password” separate from your login credentials to push and pull from GitHub. These come in two flavors: a personal access token for HTTPS or an SSH key for SSH (HTTPS and SSH are two different protocols for communication over the web). HTTPS and personal access tokens have a simpler setup, so that’s what we will be using for this class.
To create a personal access token, first go to GitHub and then account settings:
GitHub Account Drop Down
Then select Developer settings way at the bottom:
GitHub Account Settings
Within Personal access tokens /
Tokens (classic) and under Generate New Token,
select Generate new token (classic):
GitHub Developer Settings
Choose a descriptive name for your token so you know its purpose later. Something like “<class number> personal laptop” would be good. It’s best to have a separate token for each task and, at the very least, for each computer/system you use. This way, you can delete specific tokens if they are compromised or are no longer needed.
Choose an expiration date that falls after the class ends. That way, you won’t have to make a new one. For security, I would recommend not making it infinite.
Most importantly, select at least “repo” under the Select Scopes section. This allows us to use the token we are creating for push and pull operations. Your settings should look something like this:
Personal Access Token Creation
Click Generate Token. The output will look like a random
sequence of numbers and letters. Copy this token and store it somewhere
secure, such as a password manager.
DO NOT PUSH THE TOKEN TO GITHUB! If you do, anyone could have access to your GitHub account.
If you do happen to push the token to GitHub, delete the token from your repo and delete it from your account. Create a new token afterwards. Also, if you lose the token, there is no way to recover it. Instead, you should delete the token from your account and create a new one.
If you ever need to delete a token, return to this page, find the
correct token using the description you provided at creation, and then
click the Delete button.
Remember where you stored your token. You will need this token for many Git operations.
Never push any secrets to GitHub, whether the repo is private or not. Repositories are not a secure location for confidential or high-security information, even when they are private. Secrets should be encrypted and have limited access; repos offer neither. That means no keys should be written in your source code. If you ever accidentally commit a secret, purge the commit from your history before pushing. You can still purge a commit after pushing, but it is much harder to do so, and the secret should be changed after that point anyway. GitHub will often warn you if it detects something that looks like a private key on its servers.
If you ever need to share secret information with a teammate, such as an API key, do it over more secure channels and store a copy locally using environment variables. If in the future you need to deal with a large team and many secrets, GitHub does offer secure ways to share secrets: learn more here.
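One way to keep a locally stored key out of your commits is to list its file in `.gitignore`. The sketch below is a minimal example under a few assumptions: the key lives in a project-level `.Renviron` file (a file R reads at startup to set environment variables), and the variable name `MY_API_KEY` and its value are made up.

```shell
set -e
demo="$(mktemp -d)"
cd "$demo"
git init -q

# Hypothetical secret stored as an environment variable for R (not a real key)
printf 'MY_API_KEY=not-a-real-key\n' > .Renviron

# Tell Git to never track that file, and commit only the ignore rule
printf '.Renviron\n' > .gitignore
git add .gitignore

# check-ignore prints the path when an ignore rule matches it
ignored="$(git check-ignore .Renviron)"
# status lists what would go into the next commit; .Renviron should not appear
status="$(git status --porcelain)"
echo "ignored: $ignored"
echo "$status"
```

In this setup `git status` only reports the `.gitignore` file, so committing everything Git shows would not pick up the key.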
Now you should be all ready to start utilizing Git and GitHub.
The first step to using Git is to create a remote repository to store
and share your project. We are using GitHub for this purpose. To start,
sign in to GitHub and choose to create a New repo (it should
be a green button either to the left or right of the home page).
After all the settings are set, the page should look something like this:
GitHub Repository Creation
You can now click the Create repository button.
Congratulations, you’ve created your first remote repository. Now we need a local repo to actually work on our project.
To do this, we will clone our project. One way to do this is
using RStudio’s integration with Git. R Projects are an easy way to
work with Git. In RStudio, create a new project by selecting
New Project... under the File drop down menu.
When the New Project Wizard pops up, select
Version Control:
New Project Wizard
Then select Git:
Selecting Git Version Control
For the next step, we need to go back to GitHub and the repo we
created in the last section. In the repo’s home page, click the green
Code button and copy the HTTPS URL:
The Repo’s URL
Once copied, return to the New Project Wizard and paste the URL into the appropriate box. Also give your directory a name; it can be the same as the repository’s name or different. Lastly, choose the directory you want your project stored in. Your Wizard should look similar to this:
Cloning the Repo
When done, click Create Project. Congratulations, you’ve
cloned your remote repo and are now ready to start adding code.
So now you have your remote repo and local repo, but what do you do with them? You can start working as normal, such as making your first RMarkdown files. If you are working in a group, only one of you needs to create the initial files. Let’s create a new RMarkdown file where we’ll do all our work. We’ll keep the temporary starting code for now.
OK. Now we have some work done and want to share it with our
teammates. To start, save your work. Next, we want to commit
our work. Find the Git drop down menu and select
Commit...:
Git Drop Down Menu
You will now be provided with a staging menu. Here you can view the files that you’ve made changes to and what those changes were. Select the files you want to be staged, i.e., the files you want to be added to the commit. It is important to write a small but meaningful message to your commit. There are a few reasons why:
In this example, I have yet to do any work; I’ve only made the starting files. So, in my message, I’ll call it “initial commit”. This is usually the message you give when just adding a readme or starting files. Your staging menu should look something like this:
Staging a Commit
Once it looks good, press the Commit button. You’ll get
a popup, which just verifies the commit action. It will say how many
files were changed, the number of insertions and/or deletions, and then
which files were changed. You can close the popup by pressing the
Close button:
Commit Confirmation Popup
Now the changes are committed to our local repo. If we needed to, we
could roll back to a previous commit if we introduced errors. We are
looking good, so our next step is to push our commit to the
remote repo so everyone else can get our work. Notice how our staging
menu is saying our branch is ahead of ‘origin/main’ by 1 commit. This is
saying we have a local commit that is not in the remote repo. In the
same staging menu, press the Push button:
Pushing our Commit
You’ll get a new popup asking for your username. Type your GitHub
account username here, then press OK. Next it will ask for
your password. This is not your account password; instead, it is the
personal access token we made earlier. Find where you securely stored it
and paste it here. Press OK.
You’ll then get a confirmation that the push worked. The confirmation gives the hash of the commit(s), which is a commit’s unique identifier; the origin of the commit; and where it is pushing to. Here, we have the commit originating from our local repo pushing to the main branch of the remote repo.
You can now close out of the popup and the staging window.
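If you are curious what “ahead of ‘origin/main’ by 1 commit” looks like from the terminal, the sketch below counts the commits that exist locally but not on the remote, before and after a push. As an assumption of this example, a bare repo on disk (`remote.git`) stands in for GitHub so it can run offline, and the folder name `project` is made up.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for GitHub (an assumption of this sketch): a bare repo with one starting commit
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "# demo project" > seed/README.md
git -C seed add README.md
git -C seed -c user.name="GitHub" -c user.email="github@example.com" commit -q -m "Initial commit"
git clone -q --bare seed remote.git

git clone -q remote.git project
cd project
git config user.name "John Doe"
git config user.email "johndoe@example.com"

# Stage and commit the starting file locally
echo "starting code" > git_demo.Rmd
git add git_demo.Rmd
git commit -q -m "initial commit"

# Commits reachable from our branch but not from origin/main
ahead_before="$(git rev-list --count origin/main..HEAD)"
# The first line of the short status reports the same thing in words
git status -sb | head -n 1

git push -q origin main
ahead_after="$(git rev-list --count origin/main..HEAD)"
pushed_hash="$(git rev-parse --short HEAD)"
echo "ahead before push: $ahead_before, after push: $ahead_after (commit $pushed_hash)"
```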
At this point, if you return to GitHub and reload your repo’s page, you’ll see the new files and our commit message next to them. Note that the files that were created for us by GitHub also have “Initial commit” as the message.
GitHub.com Showing our Commit
Once the initial files are pushed to GitHub, all your teammates can clone the repo and begin contributing.
Let’s set up a scenario. You’ve invited your partner Pat to this repo, and he’s cloned the repo following the steps above. At this point, you both have the same starting files that we pushed in the last section. Pat is excited to get working, so he deletes the starter code, adds a new section to the .Rmd file, and knits it. The output looks like this:
The HTML Output after Pat’s Changes
Pat follows the steps above to commit and push the changes. When he does so, he sees that the file git_demo.Rmd has been modified, as noted by the status “M”, and that git_demo.html has been added (that is, newly created), as noted by the status “A”. When he clicks on the git_demo.Rmd file, he can see the things he deleted in red and the things he inserted in green. This is called a diff and is used to compare files, which is especially useful when comparing commits. He stages both files to be committed, gives a meaningful commit message, commits, then pushes the commit.
Pat’s Commit
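The same “M” and “A” statuses and the red/green diff are available from the terminal. The sketch below is an illustration only: it recreates Pat’s situation in a throwaway repo, and the file contents (including the placeholder HTML) are made up.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Pat"
git config user.email "pat@example.com"

# Starting point: one committed file containing the starter code
printf 'title\nstarter code\n' > git_demo.Rmd
git add git_demo.Rmd
git commit -q -m "initial commit"

# Pat replaces the starter code and "knits" (the HTML here is a placeholder)
printf 'title\nnew section\n' > git_demo.Rmd
echo "<html>knitted output</html>" > git_demo.html
git add git_demo.Rmd git_demo.html

# Short status: M marks a modified file, A marks a newly added one
status="$(git status --short)"
echo "$status"

# The diff of what is staged: removed lines start with -, inserted lines with +
git --no-pager diff --cached -- git_demo.Rmd
```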
Now that Pat has pushed his work, we want to pull from the
remote repo so we can add our own work. To do this, go back to the
Git drop down menu, and this time click the
Pull Branches option.
Pulling a Change
If all went smoothly, you should get a confirmation that looks similar to this:
Pull Confirmation
Great, now we have all of Pat’s work and can add our own work.
Git will work perfectly if you ping-pong files back and forth like this, but this is no different than simply emailing updated files back and forth. Git is more powerful than that and lets us work on the same file at the same time. But doing this introduces a new problem: how do you combine, or merge, two versions of a file?
The remedy is simple in cases where the changes are in different parts of the document. For example, Pat edits the intro and you add a conclusion. Git will see these two changes don’t overlap and will simply add both sections to the final document. This is called auto-merging.
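Auto-merging can be tried out in a small experiment. In the sketch below, which rests on a few assumptions, a bare repo on disk (`remote.git`) stands in for GitHub, the clones `pat` and `you` play the two teammates, and the file `report.Rmd` with its contents is made up. Pat edits only the intro, you edit only the conclusion, and the pull combines both without asking.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for GitHub (an assumption of this sketch): a bare repo with a starting file
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
printf 'Intro\nTODO intro\n\nAnalysis\nshared analysis\n\nConclusion\nTODO conclusion\n' > seed/report.Rmd
git -C seed add report.Rmd
git -C seed -c user.name="GitHub" -c user.email="github@example.com" commit -q -m "Initial commit"
git clone -q --bare seed remote.git

git clone -q remote.git pat
git clone -q remote.git you
git -C pat config user.name "Pat"
git -C pat config user.email "pat@example.com"
git -C you config user.name "You"
git -C you config user.email "you@example.com"

# Pat edits only the intro and pushes first
printf 'Intro\nPat wrote the intro\n\nAnalysis\nshared analysis\n\nConclusion\nTODO conclusion\n' > pat/report.Rmd
git -C pat commit -q -am "Write intro"
git -C pat push -q origin main

# You edit only the conclusion and commit
printf 'Intro\nTODO intro\n\nAnalysis\nshared analysis\n\nConclusion\nYou wrote the conclusion\n' > you/report.Rmd
git -C you commit -q -am "Write conclusion"

# Pull using merge; the edits do not overlap, so Git combines them on its own
git -C you pull -q --no-rebase --no-edit origin main
cat you/report.Rmd
```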
What happens if you and Pat edit the same part of the file? For example, you both edit line 17 in different manners. What should Git do in this case? Does one supersede the other? Maybe both changes are needed? In this case, Git can’t know what to do, so it is up to you, or Pat, to manually pick what is wanted in the final document.
Let’s set up an example to show this in practice.
At the start, both you and Pat have the exact same work, i.e., your repos are up-to-date with the remote repo and there is nothing else to commit/push. Pat then writes an intro and adds a code block to his section before the conclusion. His work looks like this:
Pat’s New Work
Pat knits his work, commits it, and then pushes it to the remote repo. Here are all his changes highlighted by the commit window’s diff visualizer:
Pat’s Changes Highlighted
You are working on the same file. You write a conclusion and add some code with interpretation to Pat’s section. You knit and commit your work. Here are your changes:
Your Work
You did great work, but before you are able to push, Pat pushes his work. Now you are behind the remote repo and need to pull before you can push. However, your local repo and the remote repo have “diverged”. This means there were edits to the same file and Git needs to reconcile those two file versions.
To fix this, let’s pull from origin/main. We’ll do this from the commit window.
Pull from the Commit Window
When you do this, you’ll get a message:
Pull Conflict Message
Git can’t pull from the remote repo because it doesn’t know how to
combine the divergent files. The pull will fail until we tell Git how to
reconcile them. There are a couple of ways to do this, but we’ll focus
on the first: merge, the method described above.
To tell Git that we want to reconcile divergence with merging, we need to set a configuration option in the terminal. This is similar to setting our email in the earlier section; however, this time we won’t set it as a global setting. This means you’ll need to run this command once for every project.
In your terminal, type in git config pull.rebase false
and press enter:
Configuring Merge
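To see what “not global” means in practice, here is a small sketch with two throwaway repos (the names `project_a` and `project_b` are made up). The option is set in the first one only, and reading it back shows that the second one is unaffected.

```shell
set -e
work="$(mktemp -d)"
cd "$work"
git init -q project_a
git init -q project_b

# Set the merge preference in project_a only (note: no --global flag)
git -C project_a config pull.rebase false

a_setting="$(git -C project_a config --local --get pull.rebase)"
# project_b has no local value; --get exits non-zero there, so fall back to a label
b_setting="$(git -C project_b config --local --get pull.rebase || echo "unset")"
echo "project_a: $a_setting, project_b: $b_setting"
```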
Now we can try pulling again. This time we get a new message:
Auto Merge Conflict
This message is telling us that Git tried to automatically merge the two versions of each file but ran into a conflict. Remember that for merging, if the same lines are changed, there is no automatic behavior. We will have to specify what to do in these cases.
Let’s take a look at our Rmd file.
Pat’s Intro
Your Conclusion
We can see that Git was able to successfully merge the top and bottom of our file. We didn’t add anything at the top, so Pat’s work could be easily added. And Pat didn’t add anything at the bottom, so our work was easily preserved.
In the middle, we have something different:
Merge Conflict
This is the section where there is a merge conflict. Both Pat and you changed the middle of the document. Now we have to decide what is saved, what is removed, and in what order it is placed.
There are some symbols that help us know what’s going on. At the top
we have <<<<<<< HEAD, in the middle we
have =======, and at the end we have
>>>>>>> 9a876ac1cb79f0e691584f54c7b02afd6c0394f4.
The equals signs separate the two versions we are trying to merge. Above the equals signs is our work, which is coming from “HEAD”. “HEAD” in this case just means our local, current branch. Everything below the equals signs is coming from the remote repo, specifically commit 9a876ac1cb79f0e691584f54c7b02afd6c0394f4. This is the identifying hash of the last commit we are merging with.
Now it is our job to manually merge the two sections, remembering to remove those arrows and equals signs. We can choose to delete one section completely, leave them in that order and just remove the symbols, or move them into any other order we want. We may even have to write a bit to make the two versions work together. It’s best to keep this merge commit as simple as possible; don’t go adding a whole new unrelated section.
Let’s keep both sections, but move our section beneath Pat’s. This is what our Rmd file looks like now:
Our Merge Conflict Resolution
Remember to reknit your work, because our HTML file also has a merge conflict. The best way to fix that conflict is to knit your work once you’re done fixing the merge conflict in your Rmd file. After adding the two files to our commit, here is what our changes look like:
Committing the Merge Resolution
Remember to give your commit a good message. For this commit, we were merging, so put that in your message. Commit and then push.
Awesome! You’ve just resolved a file that was changed by both you and Pat at the same time. Now Pat can pull your work and continue the project.
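For reference, the whole conflict-and-resolve cycle can be replayed in the terminal. The sketch below rests on a few assumptions: a bare repo on disk (`remote.git`) stands in for GitHub, the clones `pat` and `you` play the two teammates, and `report.Rmd` with its contents is made up. The manual edit is simulated by rewriting the file without the conflict markers; in practice you would make that edit by hand in RStudio.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for GitHub (an assumption of this sketch): a bare repo with a starting file
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
printf 'Intro\n\nAnalysis\nTODO analysis\n\nConclusion\n' > seed/report.Rmd
git -C seed add report.Rmd
git -C seed -c user.name="GitHub" -c user.email="github@example.com" commit -q -m "Initial commit"
git clone -q --bare seed remote.git

git clone -q remote.git pat
git clone -q remote.git you
git -C pat config user.name "Pat"
git -C pat config user.email "pat@example.com"
git -C you config user.name "You"
git -C you config user.email "you@example.com"

# Pat and you both change the same line; Pat pushes first
printf 'Intro\n\nAnalysis\ncode block from Pat\n\nConclusion\n' > pat/report.Rmd
git -C pat commit -q -am "Add code block"
git -C pat push -q origin main
printf 'Intro\n\nAnalysis\ninterpretation from you\n\nConclusion\n' > you/report.Rmd
git -C you commit -q -am "Add interpretation"

# Pull with the merge strategy; the conflict makes pull exit non-zero
git -C you config pull.rebase false
git -C you pull -q origin main || true
echo "--- file with conflict markers ---"
cat you/report.Rmd

# Resolve by hand: keep both sections, Pat's first, and drop the markers
printf 'Intro\n\nAnalysis\ncode block from Pat\ninterpretation from you\n\nConclusion\n' > you/report.Rmd
git -C you add report.Rmd
git -C you commit -q -m "Merge Pat's code block with my interpretation"
git -C you push -q origin main

# Pat can now pull the merged result
git -C pat pull -q origin main
echo "--- Pat's copy after pulling ---"
cat pat/report.Rmd
```

Staging the fixed file with `git add` is what tells Git the conflict is resolved; the commit that follows is the merge commit, and it records both your commit and Pat’s as its parents.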
Git and GitHub are powerful tools that can help you manage your code and collaborate with others. In this tutorial, we covered the basics of Git and GitHub, including how to create a repository, commit changes, merge commits, and push those changes to GitHub. We also covered how to do all that using the integrated Git tools within RStudio. With these tools at your disposal, you’ll be able to work more efficiently and effectively in a team. If you run into any issues while working with Git and GitHub, don’t hesitate to use the internet to troubleshoot. Stack Overflow has tons of answers to problems you might encounter. Good luck on your projects!